The Tum+tut+kul Approach to the 2nd Chime Challenge: Multi-stream Asr Exploiting Blstm Networks and Sparse Nmf
نویسندگان
چکیده
We present our joint contribution to the 2nd CHiME Speech Separation and Recognition Challenge. Our system combines speech enhancement by supervised sparse non-negative matrix factorisation (NMF) with a multi-stream speech recognition system. In addition to a conventional MFCC HMM recogniser, predictions by a bidirectional Long Short-Term Memory recurrent neural network (BLSTM-RNN) and from non-negative sparse classification (NSC) are integrated into a triple-stream recogniser. Experiments are carried out on the small vocabulary and the medium vocabulary recognition tasks of the Challenge. Consistent improvements over the Challenge baselines demonstrate the efficacy of the proposed system, resulting in an average word accuracy of 92.8 % in the small vocabulary task and an average word error rate of 41.42 % in the medium vocabulary task.
منابع مشابه
Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory
This article proposes and evaluates various methods to integrate the concept of bidirectional Long Short-Term Memory (BLSTM) temporal context modeling into a system for automatic speech recognition (ASR) in noisy and reverberated environments. Building on recent advances in Long Short-Term Memory architectures for ASR, we design a novel front-end for contextsensitive Tandem feature extraction a...
متن کاملThe Munich 2011 CHiME Challenge Contribution: NMF-BLSTM Speech Enhancement and Recognition for Reverberated Multisource Environments
We present the Munich contribution to the PASCAL ‘CHiME’ Speech Separation and Recognition Challenge: Our approach combines source separation by supervised convolutive non-negative matrix factorisation (NMF) with our tandem recogniser that augments acoustic features by word predictions of a Long Short-Term Memory recurrent neural network in a multi-stream Hidden Markov Model. The performance of...
متن کاملThe Munich Feature Enhancement Approach to the 2nd Chime Challenge Using Blstm Recurrent Neural Networks
We present a highly efficient, data-based method for monaural feature enhancement targeted at automatic speech recognition (ASR) in reverberant environments with highly non-stationary noise. Our approach is based on bidirectional Long Short-Term Memory recurrent neural networks trained to map noise corrupted features to clean features. In extensive test runs, enhanced features are evaluated wit...
متن کاملCombining Bottleneck-BLSTM and Semi-Supervised Sparse NMF for Recognition of Conversational Speech in Highly Instationary Noise
We address the speaker independent automatic recognition of spontaneous speech in highly variable noise by applying semisupervised sparse non-negative matrix factorization (NMF) for speech enhancement coupled with our recently proposed frontend utilizing bottleneck (BN) features generated by a bidirectional Long Short-Term Memory (BLSTM) recurrent neural network. In our evaluation, we unite the...
متن کاملThe ICSTM+TUM+UP Approach to the 3rd CHIME Challenge: Single-Channel LSTM Speech Enhancement with Multi-Channel Correlation Shaping Dereverberation and LSTM Language Models
This paper presents our contribution to the 3rd CHiME Speech Separation and Recognition Challenge. Our system uses Bidirectional Long Short-Term Memory (BLSTM) Recurrent Neural Networks (RNNs) for Single-channel Speech Enhancement (SSE). Networks are trained to predict clean speech as well as noise features from noisy speech features. In addition, the system applies two methods of dereverberati...
متن کامل